feat: RFC 6570 URI templates with operator-aware security #2356
Adds `mcp.shared.uri_template.UriTemplate`, a standalone utility for
parsing, expanding, and matching RFC 6570 URI templates. Supports
Levels 1-3 fully plus path-style explode (`{/var*}`, `{.var*}`,
`{;var*}`).
Matching enforces structural integrity: decoded values are validated
against their operator's permitted character set. A simple `{var}`
whose decoded value contains `/` is rejected, preventing `%2F`
smuggling while still allowing `/` in `{+var}` where it is
intentional. This is the operator-aware generalization of the
post-decode check for encoded path separators.
Also fixes the existing regex-escaping gap where template literals
like `.` were treated as regex wildcards.
The utility lives in `shared/` so it is usable from both client code
(expand) and server code (match), including lowlevel server
implementations that do not use MCPServer.
Adds `mcp.shared.path_security` with three standalone utilities for
defending against path-traversal attacks when URI template parameters
flow into filesystem operations:
- `contains_path_traversal()`: base-free component-level check for `..`
escapes; handles both `/` and `\` separators
- `is_absolute_path()`: detects POSIX, Windows drive, and UNC absolute
paths (which silently discard the base in `Path` joins)
- `safe_join()`: resolve-and-verify within a sandbox root; catches `..`,
absolute injection, and symlink escapes
These are pure functions usable from both MCPServer and lowlevel server
implementations. `PathEscapeError(ValueError)` is raised by `safe_join`
on violation.
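The resolve-and-verify idea behind `safe_join()` can be sketched with the standard library alone. This is an illustrative stand-in, not the SDK's implementation: the name `safe_join_sketch` and its signature are assumptions, and `PathEscapeError` is redefined locally so the block runs without the SDK installed.

```python
from __future__ import annotations

from pathlib import Path


class PathEscapeError(ValueError):
    """Local stand-in for the error type named in the commit message."""


def safe_join_sketch(base: str | Path, *parts: str) -> Path:
    # Hypothetical helper; the real safe_join may take different arguments.
    for part in parts:
        if "\x00" in part:
            raise PathEscapeError("null byte in path component")
    root = Path(base).resolve()
    # resolve() collapses ".." and follows symlinks, so the containment
    # check below sees the real target rather than the spelled-out path.
    # An absolute part discards the root in joinpath(), which the check
    # then catches as an escape.
    candidate = root.joinpath(*parts).resolve()
    if not candidate.is_relative_to(root):
        raise PathEscapeError(f"{candidate} escapes {root}")
    return candidate
```

A handler would call this with its sandbox directory and the extracted template parameter, and treat `PathEscapeError` as a rejected request.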
Refactors the internal `ResourceTemplate` to use the RFC 6570
`UriTemplate` engine for matching, and adds a configurable
`ResourceSecurity` policy for path-safety checks on extracted
parameters.
`ResourceTemplate.matches()` now:
- Delegates to `UriTemplate.match()` for full RFC 6570 Level 1-3
support (plus path-style explode). `{+path}` can match
multi-segment paths.
- Enforces structural integrity: `%2F` smuggled into a simple
`{var}` is rejected.
- Applies `ResourceSecurity` policy: path traversal (`..` components)
and absolute paths rejected by default, with per-parameter
exemption available.
The `@mcp.resource()` decorator now parses the template once at
decoration time via `UriTemplate.parse()`, replacing the regex-based
param extraction that couldn't handle operators like `{+path}`.
Malformed templates surface immediately with a clear
`InvalidUriTemplate` including position info.
Also fixes the pre-existing bug where template literals were not
regex-escaped (a `.` in the template acted as a wildcard).
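The escaping gap is easy to reproduce with plain `re`. The snippet below is a minimal illustration of the bug class, not the SDK's former matcher; the `[^/]+` capture pattern is an assumption.

```python
import re

template_literal = "data://v1.0/"

# Unescaped: the "." in "v1.0" acts as a regex wildcard.
unescaped = re.compile(template_literal + r"(?P<id>[^/]+)")
# Escaped: re.escape() makes "." match only a literal dot.
escaped = re.compile(re.escape(template_literal) + r"(?P<id>[^/]+)")

wrong_uri = "data://v1X0/42"
unescaped_hit = unescaped.fullmatch(wrong_uri) is not None  # True (the bug)
escaped_hit = escaped.fullmatch(wrong_uri) is not None      # False
right_id = escaped.fullmatch("data://v1.0/42").group("id")  # "42"
```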
Adds `resource_security` to `MCPServer.__init__` and a per-resource
`security` override to the `@resource()` decorator. Templates inherit
the server-wide policy unless overridden.
Exports `ResourceSecurity` and `DEFAULT_RESOURCE_SECURITY` from
`mcp.server.mcpserver` for user configuration.
Usage:
    # Server-wide relaxation
    mcp = MCPServer(resource_security=ResourceSecurity(reject_path_traversal=False))
    # Per-resource exemption for non-path parameters
    @mcp.resource(
        "git://diff/{+range}",
        security=ResourceSecurity(exempt_params=frozenset({"range"})),
    )
    def git_diff(range: str) -> str: ...
Documents the RFC 6570 support, security hardening defaults, and opt-out configuration for the resource template rewrite. Grouped with the existing resource URI section.
Changes the type from frozenset[str] to collections.abc.Set[str] so
users can write exempt_params={"range"} instead of
exempt_params=frozenset({"range"}). The default factory stays
frozenset for immutability.
… usage Adds docs/server/resources.md as the first page under the planned docs/server/ directory. Covers static resources, RFC 6570 template patterns, the built-in security checks and how to relax them, the safe_join pattern for filesystem handlers, and equivalent patterns for low-level Server implementations. Creates the docs/server/ nav section in mkdocs.yml.
RFC 6570 requires repeated variables to expand to the same value. Enforcing this at match time would require backreferences with potentially exponential cost. We reject at parse time instead, following the recommendation in #697. Previously a template like {x}/{x} would parse and silently return only the last captured value on match.
Adds coverage for encoding-based attack vectors across both security
layers:
Layer 1 (structural integrity in UriTemplate.match):
- Double-encoding %252F decoded once, accepted as literal %2F
- Multi-param template with one poisoned value rejects whole match
- Value decoding to only the forbidden delimiter rejected
Layer 2 (ResourceSecurity traversal check):
- %5C backslash passes structural, caught by traversal normalization
- %2E%2E encoded dots pass structural, caught by traversal check
- Mixed encoded+literal slash fails at regex before decoding
Cheap heuristic for distinguishing concrete URIs from templates
without full parsing. Returns True if the string contains at least
one {...} pair. Does not validate; a True result does not guarantee
parse() will succeed.
Matches the TypeScript SDK's UriTemplate.isTemplate() utility.
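A heuristic of this kind might look like the following. The function name and the exact pattern are assumptions (the source only says "at least one {...} pair"); in particular, whether an empty `{}` counts is a guess.

```python
import re

# One opening brace, anything that is not a brace, one closing brace.
_EXPRESSION = re.compile(r"\{[^{}]*\}")


def looks_like_template(uri: str) -> bool:
    """Cheap check only: True does not mean the template would parse."""
    return _EXPRESSION.search(uri) is not None
```

For example, `{foo..bar}` passes this check even though the parser described elsewhere in this PR rejects that variable name.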
Adds a max_uri_length keyword argument (default 64 KiB) that returns None for oversized inputs before regex evaluation. Guards against resource exhaustion from pathologically long URIs, particularly on stdio transport where there is no inherent message size limit. Consistent with the existing max_length/max_expressions limits on parse(); the default is exported as DEFAULT_MAX_URI_LENGTH.
Ports three test scenarios from the TypeScript SDK:
- Repeated-slash literals (///{a}////{b}////) preserved exactly and
rejected when slash count differs
- Trailing extra path component rejected (/users/{id} vs
/users/123/extra); guards against a refactor from fullmatch to
match or search
- Adjacent variables with prefix-overlapping names ({var}{vara});
documents the greedy capture split and confirms positional groups
map to the correct dict keys
Null bytes pass through Path construction but fail at the syscall boundary with a cryptic 'embedded null byte' error. Rejecting in safe_join gives callers a clear PathEscapeError instead, and guards against null-byte injection when the path is used for anything other than immediate file I/O (logging, subprocess args, config).
The @resource() decorator now classifies resources based solely on whether the URI contains template variables, not on whether the handler has parameters.
Previously, a handler taking only a Context parameter on a non-template URI would register as a zero-variable template. The template matched with an empty dict, which the walrus check in resource_manager treated as falsy, making the resource permanently unreachable. This has never worked.
Now such a handler errors at decoration time with a clear message noting that Context injection for static resources is planned but not yet supported. Handlers with non-Context parameters on non-template URIs also get a clearer error than the old 'Mismatch' message.
Also changes the resource_manager walrus check to compare against None explicitly, as defense-in-depth against any future case where matches() legitimately returns an empty dict.
Three fixes to the path-style parameter operator:
Non-explode {;id}: the regex used =? (optional equals), which let
{;id} match ;identity=john by consuming 'id' as a prefix. Added a
lookahead asserting the name ends at = or a delimiter.
Explode {;keys*} matching: captured segments included the name=
prefix (returning ['keys=a', 'keys=b'] instead of ['a', 'b']) and
did not validate the parameter name (so ;admin=true matched). Now
strips the prefix and rejects wrong names in post-processing.
Explode {;keys*} expansion: emitted name= for empty items. The ifemp
rule in RFC 6570 section 3.2.7 says ; omits the = for empty values,
so ['a', '', 'b'] now expands to ;keys=a;keys;keys=b.
All three are covered by new round-trip tests including the
empty-item edge case.
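The ifemp behaviour for exploded `;` parameters can be sketched in a few lines. `expand_param_explode` is a hypothetical helper written for illustration; it is not the SDK's expansion code.

```python
from urllib.parse import quote


def expand_param_explode(name: str, values: list[str]) -> str:
    """Sketch of {;name*} expansion for a list of string values."""
    out = []
    for value in values:
        if value == "":
            # ifemp: for the ";" operator an empty value emits the bare name.
            out.append(f";{name}")
        else:
            # safe="" leaves only RFC 3986 unreserved characters unencoded.
            out.append(f";{name}={quote(value, safe='')}")
    return "".join(out)


expanded = expand_param_explode("keys", ["a", "", "b"])  # ";keys=a;keys;keys=b"
```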
UriTemplate.match() no longer rejects decoded values containing
characters like /, ?, #, &. It now faithfully returns whatever
expand() would have encoded, so match(expand(x)) == x holds for all
inputs.
The previous check broke round-trip for legitimate values (a&b
expanded to a%26b but match rejected it) and was inconsistent with
every other MCP SDK. The spec's own canonical example file:///{path}
requires multi-segment values; Kotlin and C# already decode without
rejection and document handler-side validation as the security
contract.
Path-safety validation remains in ResourceSecurity (configurable) and
safe_join (the gold-standard check). The %2F path-traversal attack
vector is still blocked: ..%2Fetc%2Fpasswd decodes to ../etc/passwd,
which contains_path_traversal rejects. Tests confirm this end-to-end.
This aligns us with Kotlin's documented model: decode once, pass to
handler, handler validates.
UriTemplate.match() now handles trailing {?...}/{&...} expressions via
urllib.parse.parse_qs instead of positional regex. Query parameters
are matched order-agnostic, partial params are accepted, and
unrecognized params are ignored. Parameters absent from the URI stay
absent from the result so downstream function defaults apply.
This restores the round-trip invariant for query expansion: RFC 6570
skips undefined vars during expand(), so {?q,lang} with only q set
produces ?q=foo. Previously match() rejected that output; now it
returns {'q': 'foo'}.
Templates with a literal ? in the path portion (?fixed=1{&page})
fall back to strict regex matching since the URI split won't align
with the template's expression boundary.
The docs example at docs/server/resources.md (logs://{service}{?since,level}
with Python defaults) now works as documented.
Four small corrections:
Varname grammar: the RFC grammar requires dots only between varchar
groups, so {foo..bar} and {foo.} are now rejected. Previously the
regex allowed any dot placement after the first char.
Adjacent explodes: previously only same-operator adjacent explodes
({/a*}{/b*}) were rejected. Different operators ({/a*}{.b*}) are
equally ambiguous because the first operator's character class
typically includes the second's separator, so the first explode
greedily consumes both. All adjacent explodes are now rejected; a
literal or non-explode variable between them still disambiguates.
Documented the inherent ambiguity of multi-var reserved expressions
({+x,y} with commas in values) and the intentional tradeoff that
{+var} match stops at ? and # so {+path}{?q} can separate correctly.
…aptures
Two RFC-conformance fixes:
Reserved expansion ({+var}, {#var}) now passes through existing %XX
pct-triplets unchanged per RFC 6570 section 3.2.3, while still
encoding bare %. Previously quote() double-encoded path%2Fto into
path%252Fto. Simple expansion is unchanged (still encodes %
unconditionally).
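One way to pass existing pct-triplets through while still encoding a bare `%` is to split the value on triplets and quote only the pieces in between. This is a sketch under that assumption; the helper name `expand_reserved` is made up and the SDK may do this differently.

```python
import re
from urllib.parse import quote

_PCT_TRIPLET = re.compile(r"(%[0-9A-Fa-f]{2})")
# RFC 3986 reserved characters, which reserved expansion leaves as-is.
_RESERVED = ":/?#[]@!$&'()*+,;="


def expand_reserved(value: str) -> str:
    pieces = _PCT_TRIPLET.split(value)
    # With one capture group, odd indices hold the matched triplets.
    return "".join(
        piece if i % 2 else quote(piece, safe=_RESERVED)
        for i, piece in enumerate(pieces)
    )


kept = expand_reserved("path%2Fto")          # "path%2Fto"
bare = expand_reserved("100%")               # "100%25"
double = quote("path%2Fto", safe="")         # "path%252Fto" (simple-expansion style)
```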
Match patterns now use * instead of + quantifiers so defined-but-empty
values round-trip. RFC says empty variables still emit the operator
prefix: {#section} with section='' expands to '#', but the previous
.+ pattern could not match the empty capture after it. All eight
operators now consistently accept empty values.
The quantifier change affects adjacent-unrestricted-var resolution:
{a}{b} matching 'xy' now gives {a: 'xy', b: ''} (greedy first-wins)
instead of the previous {a: 'x', b: 'y'} (artifact of + backtracking).
Adjacent vars without a separating literal are inherently ambiguous
either way; a literal between them ({a}-{b}) still disambiguates.
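The shift in how adjacent variables split can be reproduced with plain `re`. The `[^/]` character class here is an assumption chosen for illustration; only the quantifier difference matters.

```python
import re

# "+" forces each group to take at least one character, so the engine
# backtracks the first group until the second has something to match.
plus = re.fullmatch(r"(?P<a>[^/]+)(?P<b>[^/]+)", "xy").groupdict()

# "*" lets the first group keep everything; the second matches empty.
star = re.fullmatch(r"(?P<a>[^/]*)(?P<b>[^/]*)", "xy").groupdict()

# Defined-but-empty values: "#" alone only matches with "*".
empty_plus = re.fullmatch(r"#(?P<section>.+)", "#")
empty_star = re.fullmatch(r"#(?P<section>.*)", "#").groupdict()
```

Here `plus` is `{'a': 'x', 'b': 'y'}` and `star` is `{'a': 'xy', 'b': ''}`, matching the before and after described above.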
Replaces tuple[X, ...] with list[X] throughout UriTemplate internals and public API. The tuples were defensive immutability nobody needed: the dataclass fields are compare=False so they do not participate in hash/eq, and the public properties now return fresh copies so callers cannot mutate internal state. Helper function parameters take Sequence[X] where they only iterate; returns are concrete list[X]. The only remaining tuples are the fixed-arity (pair) return types on _parse and _split_query_tail, which is the correct use of tuple.
The resource template migration section was documenting new features alongside behavior changes. Trimmed to the four actual breakages: path-safety checks now applied by default, template literals regex- escaped, lenient query matching, and parse-time validation. New capabilities and best-practice guidance moved to the Resources doc via a link at the end.
Adds a sentence on lenient query matching (order-agnostic, extras
ignored, defaults apply) after the logs example.
Adds the component-based clarification for the .. check so users know
values like HEAD~3..HEAD and v1.0..v2.0 are unaffected.
Fixes the exempt_params motivating example in both resources.md and
migration.md. The previous git://diff/{+range} example used
HEAD~3..HEAD, which the component-based check already passes without
exemption. Replaced with inspect://file/{+target} receiving absolute
paths, which genuinely requires the opt-out.
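A component-based, depth-tracking check of the kind described for the `..` rule could look like this. It is a sketch only: `escapes_start_dir` is a hypothetical name, and the real `contains_path_traversal()` may differ in detail.

```python
import re


def escapes_start_dir(value: str) -> bool:
    """True if the value would climb above its starting directory."""
    depth = 0
    # Split on both separators so "..\\x" is treated like "../x".
    for component in re.split(r"[/\\]", value):
        if component in ("", "."):
            continue
        if component == "..":
            depth -= 1
            if depth < 0:
                return True
        else:
            depth += 1
    return False
```

Because whole components are compared, `HEAD~3..HEAD` and `v1.0..v2.0` are single ordinary components and pass, while `../etc/passwd` is rejected.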
The [^?#]* match pattern for {+var} and {#var} overlaps with every
other operator's character class. When a trailing literal fails to
match, the regex engine backtracks through O(n) split points with
O(n) rescanning each, yielding quadratic time. A 64KB payload (the
default max_uri_length) against {+prefix}{/path*}/END consumed ~25s
CPU per request.
Two conditions trigger the quadratic case, now both rejected at parse
time:
1. {+var} immediately adjacent to any expression ({+a}{b}, {+a}{/b*})
2. Two {+var}/{#var} anywhere in the template, even with a literal
between them ({+a}/x/{+b}) since [^?#]* matches the literal too
What remains permitted:
- {+path} at end of template (the flagship use case)
- {+path}.txt or {+path}/edit (literal suffix, linear backtracking)
- {+path}{?v,page} (query expressions stripped before pattern build)
- {+a}/sep/{b} (literal + bounded expression, disambiguated)
The _check_adjacent_explodes function is generalized to
_check_ambiguous_adjacency covering both explode adjacency and the
new reserved-expansion constraints.
migration.md: added note that static URIs with Context-only handlers now error at decoration time. The pattern was previously silently unreachable (the resource registered but could never be read); now it surfaces early. Duplicate-variable-names rejection was already covered in the malformed-templates paragraph.
resources.md: clarified that the .. check is depth-based (rejects values that would escape the starting directory, so a/../b passes). Changed template reference table intro from 'what the SDK supports' to 'the most common patterns' since the table intentionally omits the rarely-used fragment and path-param operators.
test_uri_template.py: corrected the stray-} test comment. RFC 6570 section 2.1 strictly excludes } from literals; we accept it for TypeScript SDK parity, not because the RFC is lenient.
Adds two no-match cases for the lenient-query code path in
UriTemplate.match(): path regex failing when query vars are present,
and ; explode name mismatch in the path portion before a {?...}
expression.
Adds a passing case to test_resource_security_default_rejects_traversal
so the handler body executes (the test previously only sent rejected
URIs, leaving the handler uncovered).
Replaces the _make helper's unreachable return with
raise NotImplementedError since those tests only exercise matches().
…{&var}
Three fixes to the two-phase query matching path:
Replaced parse_qs with a manual &/= split using unquote(). parse_qs
follows application/x-www-form-urlencoded semantics where + decodes
to space, but RFC 6570 and RFC 3986 treat + as a literal sub-delim.
A client sending ?q=C++ previously got 'C' followed by two spaces; the path-portion
decoder (unquote) already handled this correctly, so the two code
paths disagreed.
Fragment is now stripped before splitting on ?. A URI like
logs://api?level=error#section1 previously returned
level='error#section1' via the lenient path while the strict-regex
path correctly stopped at #.
_split_query_tail now requires the trailing tail to start with a
{?...} expression. A standalone {&page} expands with an & prefix
(no ?), so partition('?') found no split and the path regex failed.
Such templates now fall through to strict regex which handles them
correctly. Also extends the path-portion check to bail on {?...}
expressions left in the path, not just literal ?.
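The first two fixes can be seen directly with the standard library. This only demonstrates the decoding difference and the fragment-first split; it is not the SDK's `_split_query_tail`.

```python
from urllib.parse import parse_qs, unquote

raw = "q=C++"

# Form decoding (application/x-www-form-urlencoded): "+" becomes a space.
form_decoded = parse_qs(raw)["q"][0]

# RFC 3986 decoding: "+" is a literal sub-delim and survives.
_, _, value = raw.partition("=")
rfc3986_decoded = unquote(value)

# Strip the fragment before splitting on "?", so "#section1" cannot leak
# into the last query value.
uri = "logs://api?level=error#section1"
path, _, query = uri.partition("#")[0].partition("?")
```

After this runs, `form_decoded` is `'C'` plus two spaces, `rfc3986_decoded` is `'C++'`, and `query` is `'level=error'`.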
match() docstring: qualified the round-trip claim with the RFC
section 1.4 caveat that values containing their operator's separator
unencoded do not round-trip (e.g. {.ext} with 'tar.gz').
resource() decorator docstring: removed the 'or the function has
parameters' clause which commit 674783f made stale; template/static
is now decided purely by URI variables.
Added DEFAULT_MAX_TEMPLATE_LENGTH, DEFAULT_MAX_EXPRESSIONS, and
DEFAULT_MAX_URI_LENGTH to __all__ to match the stated intent that
these are part of the public API.
Moved DEFAULT_MAX_URI_LENGTH import in test file from function body
to top-level per repo convention.
The eight resource-template tests added in this PR were placed inside the legacy TestServer and TestContextInjection classes to match surrounding code, but the repo convention is standalone module-level functions. Moved to the bottom of the file alongside the existing standalone tests.
…acks
Adds cases for the fallback paths introduced in the lenient-query
fixes: literal ? in path portion, {?...} expression in path portion,
empty & segments in query string, and duplicate query keys.
_extract_path was dropping all empty segments when splitting an
explode capture, but only the first empty item comes from the
leading operator prefix. Subsequent empties are legitimate values:
{/path*} with ['a', '', 'c'] expands to /a//c and must match back
to the same list.
Split by separator, strip only items[0] if empty, then iterate.
The ; operator is unaffected since empty values use the bare-name
form which is a non-empty segment.
…ator The explode capture pattern ((?:SEP body)*?) means non-empty captures always start with the separator, so split()[0] is always empty. The defensive if-check was a dead branch; slice unconditionally instead.
_split_query_tail enabled lenient matching for page{#section}{?q},
but lenient matching's partition('#') stripped the fragment before
the path regex (which expects #section) could see it, causing
fullmatch to always fail.
Extended the path-portion fallback check to also bail on {#...}
expressions and literal # characters, mirroring the existing ? check.
Such templates are semantically unusual (query-after-fragment is not
valid URI structure) but now round-trip correctly via strict regex.
No new bugs found and the two issues from my earlier review are fixed, but this is a large feature PR (~1000 lines of new security-sensitive code across uri_template.py and path_security.py) with breaking changes that warrants human review.
Extended reasoning...
Overview
This PR replaces the regex-based resource template matcher with a full RFC 6570 URI template engine (uri_template.py, ~876 lines), adds filesystem path safety primitives (path_security.py, ~156 lines), and wires both into the MCPServer decorator via a configurable ResourceSecurity policy. It also adds 376 lines of new documentation and extensive test coverage (~800 lines across two new test files plus additions to existing test files). The PR claims to close 4 issues and supersede 4 other PRs.
Security risks
The path_security.py module implements path-traversal detection, absolute-path detection, and safe_join — all security-critical primitives. The contains_path_traversal function uses a depth-tracking algorithm rather than a simple substring check, which is the right approach but needs careful human verification. The ResourceSecurity policy is secure-by-default (rejecting traversal and absolute paths), which is good. However, the correctness of these security boundaries is too important for automated approval alone.
The uri_template.py module includes ReDoS protection that rejects quadratic-backtracking patterns at parse time. The design choices here (which patterns to allow vs reject) are architectural decisions that benefit from human review.
Level of scrutiny
This PR requires high scrutiny. It introduces ~1000 lines of new production code in two new modules, contains multiple breaking changes, and touches security-sensitive code paths (path traversal protection, URI parsing for resource routing). The RFC 6570 implementation involves complex regex construction and matching logic that could have subtle edge cases beyond what automated testing catches.
Other factors
The test coverage is thorough (211 tests for URI templates, 50 for path security, plus E2E tests), and the two bugs I found in my previous review were promptly fixed with regression tests. The documentation is comprehensive. However, the breaking changes listed in the migration guide affect existing users and represent design decisions (e.g., static URI + Context-only handler now errors, query params match leniently by default) that a maintainer should explicitly sign off on.
Bare dict return types are now parameterized (dict[str, str] or dict[str, Any] as appropriate). Low-level handler examples now include ServerRequestContext[Any] and PaginatedRequestParams types for the ctx and params parameters, with the corresponding imports added to each code block.
Added a link to the MCP resources specification after the intro.
Rewrote the multi-segment paths section to lead with the problem:
show a URI that fails with {name} before introducing {+name} as the
fix. Code comments align inputs with outputs for at-a-glance parsing.
Rewrote the query parameters section to lead with the two concrete
URIs a user would want to support (base and with-query), then show
how one template covers both.
The adjacency check rejected {+a}{b} but not the symmetric {a}{+b}.
Both produce overlapping greedy quantifiers; a 64KB crafted input
against prefix{a}{+b}.json takes ~23s to reject.
Added prev_path_expr tracking so {+var} immediately after any path
expression is rejected. {expr}{#var} remains allowed since the #
operator prepends a literal '#' that the preceding group's character
class excludes, giving a natural boundary.
Also adds the missing 'from typing import Any' to the three low-level
server examples in docs/server/resources.md.
The regex-based URI template matcher required an ever-growing set of
parse-time adjacency checks to reject templates that would cause
catastrophic backtracking. Python's re module is a backtracking engine,
so any pair of greedy groups with overlapping character classes and a
failing suffix produces O(n^k) match time. Enumerating every such
combination proved intractable — each fix revealed another bypass.
This replaces the regex matcher with a two-ended linear scan:
- Templates are flattened into a sequence of literal and capture atoms,
with operator prefixes/separators lowered to explicit literals.
- A template may contain at most one multi-segment variable ({+var},
{#var}, or explode). This is the only structural restriction.
- The suffix is scanned right-to-left: literals via endswith, bounded
variables via rfind of the preceding literal. This matches regex
greedy-first semantics exactly for templates without a greedy var.
- If a greedy var exists, the prefix is scanned left-to-right with
lazy anchor-finding, and the greedy var gets whatever remains between
prefix_end and suffix_start.
Every URI character is visited O(1) times per atom. There is no
backtracking; a failed anchor search returns None immediately.
Removes _check_ambiguous_adjacency (80 lines of state tracking),
_build_pattern, _expression_pattern, and the _pattern field. Templates
previously rejected for adjacency ({+a}{b}, {a}{+b}, prefix/{+path}{.ext})
are now accepted and match in linear time. The only rejected patterns
are those with two or more multi-segment variables, which are
inherently ambiguous regardless of algorithm.
Adds tests for:
- Prefix literal/anchor failures before a greedy var
- Greedy scalar containing its own stop-char
- Explode span not starting with separator or containing stop-chars
- ifemp handling in the left-to-right prefix scan
- Adjacent bounded caps in prefix (first-takes-to-stop-char)
Also converts the prefix_end > suffix_start check to an assertion: _scan_prefix is bounded by suffix_start so the condition cannot hold.
RFC 6570 expansion never percent-encodes variable names, so a
legitimate match will always have the parameter name in literal form.
Decoding names before the duplicate-key check let an attacker shadow a
real parameter by prepending a percent-encoded duplicate:
api://x?%74oken=evil&token=real -> {token: evil}
With this change the encoded form is treated as an unrecognized
parameter and ignored, so the literal form wins.
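A sketch of query matching that compares names in raw form and decodes only values. `match_query` is a hypothetical helper; keeping the first occurrence of a recognized name is an assumption inferred from the example above.

```python
from __future__ import annotations

from urllib.parse import unquote


def match_query(query: str, expected: set[str]) -> dict[str, str]:
    result: dict[str, str] = {}
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        # Names are NOT decoded: expansion never percent-encodes them, so
        # "%74oken" is just an unrecognized parameter, not "token".
        if name in expected and name not in result:
            result[name] = unquote(value)
    return result


matched = match_query("%74oken=evil&token=real", {"token"})  # {"token": "real"}
```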
A %00 in a URI decodes to \x00, which defeats the traversal check's
string comparison ("..\x00" != "..") and can cause truncation in
handlers that pass values to C extensions or subprocess.
safe_join already rejects null bytes; this closes the defense-in-depth
gap so ResourceSecurity catches them before the handler runs. The
check runs first so it also covers the traversal-bypass case.
ResourceTemplate.matches() previously returned None for both "URI doesn't match this template" and "URI matches but fails security validation". ResourceManager.get_resource iterates templates and uses the first non-None result, so a strict template's security rejection would silently fall through to a later, possibly permissive, template. Registration order became security-critical without documentation.
matches() now raises ResourceSecurityError on security failure, halting template iteration at the first rejection. The error carries the template string and the offending parameter name.
ResourceSecurity.validate() now returns the name of the first failing parameter (or None if all pass) rather than a bool, so the error can identify which parameter was rejected.
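The fall-through problem and the fix can be modelled with toy classes. Everything here (`ToyTemplate`, the prefix-based matching, the local `ResourceSecurityError`) is a simplified assumption for illustration and does not mirror the SDK's classes.

```python
from __future__ import annotations

from dataclasses import dataclass


class ResourceSecurityError(ValueError):
    """Local stand-in for the SDK's error type."""


@dataclass
class ToyTemplate:
    prefix: str
    strict: bool

    def matches(self, uri: str) -> dict[str, str] | None:
        if not uri.startswith(self.prefix):
            return None  # not this template: keep iterating
        value = uri[len(self.prefix):]
        if self.strict and ".." in value.split("/"):
            # Structural match but unsafe value: raise instead of
            # returning None, so later templates are never consulted.
            raise ResourceSecurityError(f"{self.prefix}: rejected 'path'")
        return {"path": value}


def get_resource(templates: list[ToyTemplate], uri: str):
    for template in templates:
        params = template.matches(uri)
        if params is not None:  # explicit None check: {} is a valid match
            return template, params
    return None
```

With a strict template registered before a permissive one for the same prefix, an unsafe URI now raises at the strict template instead of being served by the permissive one.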
Bundled low-severity hardening:
- Lower DEFAULT_MAX_TEMPLATE_LENGTH from 1MB to 8KB. Real templates
are under 200 bytes; the old limit allowed 0.75s parse times.
- Replace max_expressions with max_variables (default 256). A single
{v0,v1,...,vN} expression packed arbitrarily many variables under
one expression count, bypassing the limit.
- Store UriTemplate internals as tuples. The dataclass is frozen but
list fields were mutable via t._parts.append(), violating the
immutability contract.
- Coerce ResourceSecurity.exempt_params to frozenset in __post_init__
so hash() works even when callers pass a regular set.
- Check drive letters against ASCII only. str.isalpha() is
Unicode-aware, so is_absolute_path("Ω:foo") falsely returned True.
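The drive-letter item comes down to `str.isalpha()` accepting any Unicode letter. A minimal ASCII-only check might look like this; `has_drive_letter` is a hypothetical helper, not the SDK's `is_absolute_path()`.

```python
import string


def has_drive_letter(path: str) -> bool:
    """True for 'C:...' style prefixes with an ASCII drive letter only."""
    return len(path) >= 2 and path[0] in string.ascii_letters and path[1] == ":"


omega_is_alpha = "Ω".isalpha()            # True: why isalpha() is too broad
omega_drive = has_drive_letter("Ω:foo")   # False
c_drive = has_drive_letter("C:\\Users")   # True
```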
Pydantic's AnyUrl resolves %2E%2E and traversal during validation, so
str(AnyUrl("file:///a/%2E%2E/b")) yields "file:///b". The JSON-RPC
protocol layer uses raw str and is unaffected, but internal callers
wrapping in AnyUrl get silently different security semantics.
The normalisation is mostly protective (resolved paths won't match
templates with fixed prefixes), so this documents the inconsistency
rather than narrowing the signature.
…rence
The R->L scan used rfind to locate the literal preceding a variable.
When that literal is the first atom of the template and its text
appears inside the variable's value, rfind lands on the occurrence
inside the value rather than at position 0, leaving unconsumed
characters and returning None.
UriTemplate.parse("prefix-{id}").match("prefix-prefix-123")
# returned None; regex returns {'id': 'prefix-123'}
For templates without a greedy variable, the atom sequence IS the
whole template, so atoms[0] is positionally fixed at URI position 0.
_scan_suffix now takes an anchored flag: when set, the first-atom
literal anchors at 0 rather than searching via rfind.
Also: adjacent captures now skip the stop-char scan entirely since the
result was discarded (start = pos). This drops the worst-case from
O(n*v) to O(n + v) for the pathological all-adjacent-vars case
(497ms -> 2ms for 256 vars against 64KB), and the module docstring
now states the complexity accurately.
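The first-atom anchoring bug can be reduced to a one-literal, one-variable case. These two functions are a simplified model of the behaviour described, not the SDK's `_scan_suffix`.

```python
from __future__ import annotations


def match_unanchored(literal: str, uri: str) -> str | None:
    # rfind picks the LAST occurrence, which may sit inside the value.
    start = uri.rfind(literal)
    if start != 0:  # not found, or unconsumed characters remain before it
        return None
    return uri[start + len(literal):]


def match_anchored(literal: str, uri: str) -> str | None:
    # The first atom of a template is positionally fixed at index 0.
    if not uri.startswith(literal):
        return None
    return uri[len(literal):]


before = match_unanchored("prefix-", "prefix-prefix-123")  # None
after = match_anchored("prefix-", "prefix-prefix-123")     # "prefix-123"
```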
- migration.md: path-safety checks now raise ResourceSecurityError rather than silently falling through; null bytes are rejected by default; templates may have at most one multi-segment variable
- resources.md: add reject_null_bytes to the settings table; note that ResourceSecurity is a heuristic and safe_join remains the containment boundary
The previous matcher was a naive replace('{', '(?P<').replace('}',
'>[^/]+)') that threw re.error on any operator character. Removed
items describing constraints on features that did not exist in v1.x:
- 'At most one multi-segment variable': {+var}/{#var}/explode all
threw re.error in v1.x, so nobody had a working template with one
let alone two. Covered in resources.md.
- 'Query parameters match leniently': {?q} also threw re.error. The
lenient-query feature is new, not a behavior change.
Also folded the structural-delimiter change into the literals item
and softened 'malformed templates' to note it's an error-timing
change (re.error at match time -> InvalidUriTemplate at decoration).
Passing a mutable set to a frozen dataclass and expecting hash() to work is a caller error, not something the SDK needs to defend against.
One logic fix and a sweep of stale references left over from the
regex-to-scan rewrite.
ifemp round-trip (_scan_prefix): the name-continuation guard rejected
the empty-value case when the template's next literal started with a
non-stop-char. api{;key}X{+rest} with key='' expands to api;keyX/tail
but matched None because 'X' after ;key was treated as a name
continuation. Now checks whether the next literal starts at the
current position before rejecting.
Doc/style cleanups:
- match() docstring: 'regex derived from the template' -> 'linear scan'
- _split_query_tail: 'strict regex' -> 'strict scan'
- test comments: 5x 'regex' -> 'scan'
- DEFAULT_RESOURCE_SECURITY: docstring now mentions null-byte rejection
- migration.md: describe client-visible 'Unknown resource' error rather
than the internal ResourceSecurityError type
- _Atom type alias: remove unnecessary string quoting
- UriTemplate fields: list[...] not tuple[..., ...] — arbitrary-sized
tuples are not a defence worth having
putting this back as a draft as idk if the maintenance burden is worth it. This is defined in the MCP spec as RFC 6570 but like... RFC 6570 technically only defines expansion and not matching. So in reality it's awkward for us to be setting resources as strings in the form of RFC 6570 and then the SDKs somehow have to be able to match a So idk dude, I'm going to draft this for now and figure out:
either way, will leave as draft for a few days and think through it.
Replaces the regex-based resource template matcher with a linear-time RFC 6570 implementation, adds configurable path-safety validation, and ships a standalone
UriTemplateutility usable from the low-level server.Motivation and Context
The existing implementation supports only RFC 6570 Level 1 (simple
{var}expansion). Users writingfile://{+path}to match multi-segment paths hit a confusing "mismatch" error because the decorator's regex-based parameter extraction doesn't understand the+operator. The TypeScript, C#, and Go SDKs all support Level 3 or higher.The matcher also has a regex-escaping gap: template literals like
.become wildcards, sodata://v1.0/{id}incorrectly matchesdata://v1X0/42.Separately, path traversal via encoded slashes (
..%2Fetc%2Fpasswd) is a known concern for resource handlers that touch the filesystem. This PR adds a configurable pre-validation layer with a decode-once-validate-in-handler model, aligned with the Kotlin SDK's documented contract.Closes #436, closes #220, closes #159, closes #378
Supersedes #197, #427, #1439, #2353
Architecture
Three layers:
- `mcp.shared.uri_template.UriTemplate`: standalone RFC 6570 engine supporting Levels 1-3 plus path-style explode (`{/var*}`, `{.var*}`, `{;var*}`). Provides `parse()`, `expand()`, and `match()`.
  The matcher uses a two-ended linear scan rather than regex: templates are flattened into literal and capture atoms, the suffix is scanned right-to-left via `rfind`, and if a greedy variable (`{+var}`, `{#var}`, or explode) is present the prefix is scanned left-to-right with the greedy var claiming the gap. Every URI character is visited O(1) times per atom. There is no backtracking and therefore no ReDoS surface, a property Go and C# get for free from their linear-time regex engines but Python's `re` module cannot provide. The only structural restriction is at most one multi-segment variable per template, which is the genuine ambiguity floor.
  Query expressions (`{?q,lang}`) are matched leniently via a manual `&`/`=` split with RFC 3986 decoding: order-agnostic, partial params accepted, extras ignored, `+` treated as literal (not space). Absent params stay absent so Python function defaults apply naturally.
- `mcp.shared.path_security`: `contains_path_traversal()`, `is_absolute_path()`, `safe_join()` primitives. `safe_join` resolves the joined path and verifies it stays within a base directory, catching symlink escapes and null bytes.
- `ResourceSecurity` policy: configurable dataclass wired into `MCPServer(resource_security=...)` and `@mcp.resource(..., security=...)`. Defaults reject `..` sequences that escape the starting directory, absolute-path values, and null bytes. Per-parameter exemptions via `exempt_params={"name"}`. Security rejections raise `ResourceSecurityError` to halt template iteration, so a strict template's rejection does not silently fall through to a later permissive template.
How Has This Been Tested?
- `tests/shared/test_uri_template.py` covering parsing, expansion, matching, round-trip, lenient query matching, and parse-time validation
- `tests/shared/test_path_security.py` covering traversal detection, absolute-path detection, and `safe_join` including symlink escapes
- [...] `Client` verifying the `@mcp.resource` decorator with all operators, `ResourceSecurity` policy enforcement, function-default fallthrough for optional query params, and the security-rejection-halts-iteration contract
- [...] (`%252F`), encoded backslash (`%5C`), encoded dots (`%2E%2E`), null bytes (`%00` including `..%00` defeating string comparison), query name collision via percent-encoding (`%74oken`), multi-param with one poisoned value
- [...] (`{a}{b}X`, `{+a}{b}`, `{/a*}/L{/b*}/M`, chained explodes) tested for both correctness and linear scaling at large input sizes
Breaking Changes
- [...] `..`, contain a null byte, or look like absolute paths are rejected by default. The client receives "Unknown resource" and template iteration stops. Configure via `ResourceSecurity` to opt out.
- `.` in the template matches only a literal dot; simple `{var}` stops at `?`, `#`, `&`, `,` where it previously swallowed them.
- [...] `InvalidUriTemplate` at decoration time instead of silently misbehaving or raising `re.error` on first match.
- [...] `Context` now errors at decoration time. This pattern was previously silently broken (resource registered but unreachable).
- `DEFAULT_MAX_TEMPLATE_LENGTH` lowered from 1 MB to 8 KB. Real templates are under 200 bytes; the previous limit allowed parse-time resource exhaustion.
See `docs/migration.md` for migration guidance.
Types of changes
Checklist
Additional context
Why not regex? Python's `re` is a backtracking engine. Any pair of greedy groups with overlapping character classes and a failing suffix produces polynomial match time. Go's `regexp` (RE2) and C#'s `RegexOptions.NonBacktracking` guarantee linear time and are immune by construction; Python has no equivalent before 3.11, and even 3.11's atomic groups break legitimate `{+path}.txt` backtracking. Enumerating every dangerous pattern at parse time proved intractable: each fix revealed another bypass. The linear scan eliminates the class of problem.
`UriTemplate.match()` decodes once and returns values as-is. This aligns with the Kotlin and C# SDKs and with the MCP spec's own canonical `file:///{path}` example. Path-safety validation lives in `ResourceSecurity` (configurable pre-checks) and `safe_join` (the resolve-and-verify check for filesystem handlers).
The `;` path-param operator supports full expand/match round-trip including the RFC §3.2.7 `ifemp` edge case, the only SDK implementation that does.
New documentation at `docs/server/resources.md` covers template syntax, security configuration, and low-level server usage.
AI Disclaimer